Abstract

Genome-wide association studies have been successful in identifying common variants that impact complex human 
traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion 
of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for 
much of the 'missing heritability'. Consequently, there has been an exciting period of research and development 
into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex 
traits within the same functional unit or genomic region. In this review, we describe a general framework for model- 
ling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. 
We summarise a range of widely used association tests that have been developed from this model and provide an 
overview of the relative performance of these approaches from published simulation studies. 

Keywords: rare variant; burden test; dispersion test; statistical methodology; genome-wide association; whole-genome and 
INTRODUCTION

Genome-wide association studies (GWAS) have 
been extremely successful in identifying loci contri- 
buting to a wide range of complex human traits and 
diseases [1]. However, association signals in these loci 
are typically characterised by common lead single 
nucleotide polymorphisms (SNPs), each of modest 
effect, which when considered together account 
for only a small proportion of the genetic variance 
of the trait [2]. For example, the 180 reported loci 
for human height in the general population together 
explain no more than 10% of the genetic variance of
the trait [3], whilst the joint effects of lead SNPs at 63
established loci for type 2 diabetes account for less
than 6% of the familial aggregation of the disease [4].
Although there may be many additional common 
SNPs with effects on complex traits that are too 
modest to have been discovered through current 
GWAS efforts [5], it seems unlikely that the 'com- 
mon disease, common variant' paradigm will be all 
encompassing. Consequently, there has been much 
recent debate as to the role that rare genetic vari- 
ation, often defined to have minor allele frequency 
(MAF) of less than 1%, might play in explaining the 
'missing heritability' of complex human traits [6, 7].

Rare genetic variants are likely to have arisen
from mutation events in the last 20 generations,
and thus are more likely than common SNPs to be
ethnic specific or polymorphic in just one population
[8]. They are also expected to have larger effects on
complex traits than common variants because they
will not have been subject to purifying selection after
the recent expansion of the human population [9].
However, because of the low MAF, these effects are 
unlikely to be sufficiently large to be detected with 
the usual single SNP association tests used in the 
analysis of GWAS. Furthermore, traditional geno- 
typing platforms used in GWAS have primarily 
been designed to capture common SNPs, taking advantage
of the structure of linkage disequilibrium
throughout the genome, but offer only poor cover- 
age of rare genetic variation [10]. 

The most comprehensive approach to assaying 
rare genetic variation is through large-scale 
re-sequencing studies [11]. With considerable 
improvements in the throughput and efficiency of 
these technologies, whole-genome or whole- 
exome re-sequencing in large sample sizes is increas- 
ingly becoming a realistic financial undertaking for 
many research groups. Furthermore, high-density 
reference panels from the 1000 Genomes Project 
Consortium, derived from large-scale re-sequencing 
efforts in multiple populations, provide a compre- 
hensive catalogue of genetic variation with MAF as 
low as 0.5% across ethnic groups, as well as many 
rarer variants [12, 13]. Such reference panels could be 
used to select rare variants for inclusion on custom- 
designed arrays, potentially with priority given to 
those with likely functional consequences, such as 
the Illumina Infinium HumanExome BeadChip,
enabling cost-effective genotyping in the large 
sample sizes required for complex trait association 
studies. Furthermore, if samples have already been 
assayed with traditional GWAS arrays, imputation 
techniques can make use of this common SNP scaf- 
fold to predict genotypes at variants, including those 
of lower frequency, that are present in the higher 
density reference panel, incurring no additional 
cost, other than computation [14]. 

With the increasing availability of high-quality
data from large-scale re-sequencing, genotyping 
and imputation studies of complex human traits, 
there has been an exciting period of development 
of statistical methodology for the analysis of rare gen- 
etic variation from this 'next generation' of GWAS. 
These methods have primarily focused on the ana- 
lysis of rare variants within the same 'functional unit' 
(exon, gene or pathway) or genomic region, increasing
power to detect association over single SNP
approaches by considering their joint effects on complex
traits. In this review, we describe a general
framework for modelling the joint effects of rare
genetic variants on complex traits in association studies
of unrelated individuals. We summarise a range
of widely used association tests that have been developed
from this model and provide an overview of
the relative performance of these approaches from
published simulation studies.

METHODOLOGY FOR THE 
ANALYSIS OF RARE GENETIC 
VARIATION 

Consider a sample of unrelated individuals who have 
been typed for rare variants within some functional 
unit or genomic region. Within a generalised linear 
modelling (GLM) framework, we can model the 
phenotype, $y_i$, of the $i$th individual as

$$g(E[y_i]) = \alpha + f(G_i),$$

where $g(\cdot)$ is the link function. In this expression, $f(\cdot)$
is some function of the genotypes, $G_i$, of the $i$th
individual, typically coded as $G_{ij} \in \{0, 1, 2\}$ according
to the number of minor alleles they carry at
the $j$th variant. In an imputed GWAS, $G_{ij}$ is most
often replaced by the expected genotype, $E[G_{ij}]$,
under a dosage model. Specifically,

$$E[G_{ij}] = p_{ij1} + 2p_{ij2},$$

where $p_{ij1}$ and $p_{ij2}$ denote the imputed probabilities
that the $i$th individual carries heterozygous and rare
homozygous genotypes, respectively, at the $j$th variant.
The properties of the rare variant association test
are then determined by the form of the function $f(\cdot)$,
as described in detail below.
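As a small illustration of the dosage model, the following Python sketch computes the expected genotype from imputed genotype probabilities. The function name and the probabilities are hypothetical and chosen only for the example.

```python
def expected_dosage(p_het, p_hom_minor):
    """Return E[G_ij] = p_ij1 + 2 * p_ij2 for one individual at one variant."""
    if min(p_het, p_hom_minor) < 0.0 or p_het + p_hom_minor > 1.0 + 1e-9:
        raise ValueError("genotype probabilities must be non-negative and sum to at most 1")
    return p_het + 2.0 * p_hom_minor


# Hypothetical imputed probabilities for one individual at three rare variants:
# (P(heterozygous), P(rare homozygous)).
imputed = [(0.10, 0.00), (0.90, 0.05), (0.00, 0.00)]
dosages = [expected_dosage(p1, p2) for p1, p2 in imputed]
```

The resulting dosages lie between 0 and 2 and can be used in place of the hard genotype calls in any of the linear functions described below.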

Most rare variant statistical methodologies have 
been developed for quantitative traits (identity link 
function) or dichotomous phenotypes (logistic link 
function). However, the GLM can also incorporate 
more complex phenotypes including categorical re- 
sponses and 'time to event' outcomes. Furthermore, 
the flexibility of the GLM framework facilitates in- 
corporation of covariates to allow for adjustment for 
confounders, including non-genetic risk factors and 
indicators of population structure. 

Burden tests 

Burden tests of association have been developed by 
modelling the effect of accumulations of minor
alleles at rare variants, referred to as the 'mutational
load', within some functional unit or genomic
region. Under this model, $f(\cdot)$ is a simple linear function
of the genotypes, $G_i$, given by

$$f(G_i) = \beta \sum_j \omega_j G_{ij},$$

where $\beta$ denotes the effect on the trait (log-odds
ratio for a dichotomous phenotype) of each copy
of a minor allele at rare variants within the functional
unit or genomic region, and $\omega_j \in [0,1]$ corresponds
to the weight given to the $j$th rare variant.
Consequently, each rare variant has the same direction,
but not necessarily the same magnitude, of
effect on the phenotype.

The simplest approach is to assume 'unit
weighting', where $\omega_j$ is an indicator variable, such
that $\omega_j = 1$ if the $j$th rare variant is to be included
in the analysis, and $\omega_j = 0$ otherwise. This 'masking'
scheme may reflect annotation and/or frequency, so
that, for example, only coding or non-synonymous variants
below some pre-specified MAF threshold are
included in the analysis. Such an approach has been
implemented in GRANVIL [15, 16], where

$$f(G_i) = \frac{\beta}{W} \sum_j \omega_j X_{ij},$$

and $W = \sum_j \omega_j$. Furthermore, in GRANVIL,
genotypes are recoded under a dominant model
such that $X_{ij} = 1$ if $G_{ij} > 0$, and $X_{ij} = 0$ otherwise, or
by $X_{ij} = p_{ij1} + p_{ij2}$ for an imputed GWAS, because the
rare homozygous genotype is so infrequent.
GRANVIL then uses a likelihood ratio test of the
null hypothesis of no association, $\beta = 0$, of the trait
with rare variants in the functional unit or genomic
region.
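The unit-weighting mask and dominant recoding can be sketched in a few lines of Python. This is an illustration only, not the GRANVIL implementation: the function names are hypothetical, and the predictor is assumed to be the weighted proportion of included variants at which an individual carries a minor allele.

```python
def unit_weights(mafs, is_coding, maf_threshold=0.01):
    """Masking scheme: weight 1 for coding variants below the MAF threshold, else 0."""
    return [1 if (maf < maf_threshold and coding) else 0
            for maf, coding in zip(mafs, is_coding)]


def dominant_recode(genotypes):
    """X_ij = 1 if the individual carries at least one minor allele, else 0."""
    return [1 if g > 0 else 0 for g in genotypes]


def burden_predictor(genotypes, weights):
    """Weighted proportion of included variants at which minor alleles are carried."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("no variants pass the mask")
    recoded = dominant_recode(genotypes)
    return sum(w * x for w, x in zip(weights, recoded)) / total_weight


# Hypothetical region with four variants; the third fails the MAF threshold
# and the fourth fails the annotation filter.
mafs = [0.002, 0.008, 0.040, 0.001]
is_coding = [True, True, True, False]
weights = unit_weights(mafs, is_coding)
load = burden_predictor([1, 0, 2, 1], weights)
```

The single predictor per individual would then be entered into the GLM, so that the test of association has one degree of freedom irrespective of the number of variants in the region.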

An alternative approach to modelling the muta- 
tional load of a functional unit or genomic region is 
to 'collapse' rare variants into a 'super-allele' such
that

$$f(G_i) = \beta\, I\Big[\sum_j \omega_j G_{ij}\Big],$$

where $I[\sum_j \omega_j G_{ij}] = 1$ if $\sum_j \omega_j G_{ij} > 0$, and
$I[\sum_j \omega_j G_{ij}] = 0$ otherwise. This collapsing technique
has been implemented in a Fisher's exact test
for a 2 × 2 contingency table for dichotomous
phenotypes in CAST [17] and CCRaVAT [18],
and in an ANOVA framework for quantitative
traits in QuTie [18]. The combined multivariate
and collapsing method extends this approach to
allow for simultaneous analysis of multiple
super-alleles in a regression framework [19]. In this
context, each super-allele might correspond to alternative
non-overlapping masking schemes for the
same set of variants, for example, different MAF
thresholds and/or annotation categories, or to variants
in different functional units or genomic regions.
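The collapsing approach for a dichotomous phenotype can be illustrated with the following Python sketch, which builds the 2 × 2 table of case/control status by super-allele carrier status and evaluates a two-sided Fisher's exact test. It is a sketch under simple assumptions (hypothetical function names and toy data), not the CAST or CCRaVAT code.

```python
from math import comb


def super_allele(genotypes, weights):
    """Indicator of carrying a minor allele at any variant included by the mask."""
    return 1 if sum(w * g for w, g in zip(weights, genotypes)) > 0 else 0


def carrier_table(genotype_rows, is_case, weights):
    """2 x 2 table: rows are cases/controls, columns are carriers/non-carriers."""
    table = [[0, 0], [0, 0]]
    for row, case in zip(genotype_rows, is_case):
        carrier = super_allele(row, weights)
        table[0 if case else 1][0 if carrier else 1] += 1
    return table


def fisher_exact_two_sided(table):
    """Sum the probabilities of all tables no more likely than the observed one."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1.0 + 1e-9))


# Toy data: four cases followed by four controls, typed at two rare variants.
genotype_rows = [[1, 0], [0, 1], [0, 2], [0, 0], [0, 0], [0, 0], [1, 0], [0, 0]]
is_case = [True, True, True, True, False, False, False, False]
table = carrier_table(genotype_rows, is_case, weights=[1, 1])
p_value = fisher_exact_two_sided(table)
```

Because each individual contributes only a carrier indicator, the number of minor alleles carried beyond the first has no influence on the test.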

One of the disadvantages of the unit-weighting 
scheme described above is that a MAF threshold for 
inclusion of rare variants in the analysis must be spe- 
cified in advance. Setting the MAF threshold too 
low might exclude important causal variants from 
the burden test, thereby reducing power. 
On the other hand, setting the MAF
threshold too high might result in inclusion of 
many non-causal variants in calculating the muta- 
tional load, again resulting in a decrease in power. 
To overcome this problem, the variable threshold 
method considers multiple masking schemes for the 
same set of variants in a given functional unit or 
genomic region on the basis of MAF [20]. This ap- 
proach has been motivated by the concept that there 
is some unknown 'optimal' MAF below which vari- 
ants are most likely to have a direct impact on com- 
plex traits. Consequently, a test of association of the 
trait with the super-allele is performed at multiple
MAF thresholds, with significance assessed by 
means of permutation. 
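The logic of testing at multiple MAF thresholds and correcting for the maximisation by permutation can be sketched as follows. This is a simplified illustration rather than the published variable threshold method: the score used here is a plain standardised burden of minor allele counts (an assumption made for brevity), and all names and data are hypothetical.

```python
import random
from math import sqrt


def burden_z(genotype_rows, trait, include):
    """Simplified z-like burden score over the variants flagged in `include`."""
    mean_y = sum(trait) / len(trait)
    loads = [sum(g for g, keep in zip(row, include) if keep) for row in genotype_rows]
    denom = sqrt(sum(b * b for b in loads))
    if denom == 0.0:
        return 0.0  # no minor alleles pass this threshold
    return sum((y - mean_y) * b for y, b in zip(trait, loads)) / denom


def max_abs_z(genotype_rows, trait, mafs, thresholds):
    """Largest absolute score across the candidate MAF thresholds."""
    return max(abs(burden_z(genotype_rows, trait, [q <= t for q in mafs]))
               for t in thresholds)


def variable_threshold_test(genotype_rows, trait, mafs, thresholds, n_perm=999, seed=1):
    """Permutation p-value for the maximum score over thresholds."""
    rng = random.Random(seed)
    observed = max_abs_z(genotype_rows, trait, mafs, thresholds)
    shuffled = list(trait)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype relationship
        if max_abs_z(genotype_rows, shuffled, mafs, thresholds) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


genotype_rows = [[1, 0], [0, 0], [0, 1], [0, 0]]
trait = [2.0, 0.0, 1.0, 1.0]
mafs = [0.001, 0.004]
observed, p_value = variable_threshold_test(
    genotype_rows, trait, mafs, thresholds=[0.002, 0.005], n_perm=199)
```

The maximisation is repeated inside every permutation replicate, so the p-value accounts for having searched over thresholds.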

Under the unit-weighting model, all rare variants
included in the masking scheme are assumed to have
the same magnitude of effect on the phenotype, as
well as the same direction. As an alternative, the
Madsen and Browning weighting scheme [21]
allows lower-frequency variants to have a greater
impact on the phenotype than those that are
more common, such that

$$\omega_j = \frac{1}{\sqrt{q_j(1 - q_j)}},$$

where $q_j$ is the MAF of the $j$th variant. The weighted
sum statistic for dichotomous disease phenotypes
makes use of this weighting scheme, based on the
MAF in controls, to rank individuals according to
their mutational load, $M_i = \sum_j \omega_j G_{ij}$, in the
functional unit or genomic region [21]. A Wilcoxon
test with permutation is then used to evaluate the
significance of association by comparing ranks in
cases and controls. The cumulative minor allele test
provides a unified framework, allowing for general
weighting schemes, taking account of both MAF and
annotation [22].
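A minimal Python sketch of frequency-based weighting is given below. It uses the weight $1/\sqrt{q_j(1 - q_j)}$ applied to the MAF supplied by the caller; how the MAF is estimated (for example, in controls only) is left outside the sketch, and the function names are hypothetical.

```python
from math import sqrt


def frequency_weight(maf):
    """Weight 1 / sqrt(q * (1 - q)); rarer variants receive larger weights."""
    if not 0.0 < maf < 1.0:
        raise ValueError("MAF must lie strictly between 0 and 1")
    return 1.0 / sqrt(maf * (1.0 - maf))


def mutational_load(genotypes, mafs):
    """Weighted count of minor alleles carried by one individual."""
    return sum(frequency_weight(q) * g for g, q in zip(genotypes, mafs))


mafs = [0.01, 0.001]
load_rarer = mutational_load([0, 1], mafs)   # carries the rarer allele
load_common = mutational_load([1, 0], mafs)  # carries the more common allele
```

Individuals would then be ranked on their loads, with the ranks compared between cases and controls.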
Generalised burden tests

As described above, an implicit underlying assump- 
tion of burden tests is that of the same direction of 
effect on phenotype of all rare variants in the same 
functional unit or genomic region. To remove this 
restrictive assumption, Han and Pan [23] proposed 
the data adaptive sum test (aSUM), which redefines 
the weighting scheme as $\omega_j = 1$ if $\gamma_j > 0$, and
$\omega_j = -1$ otherwise, where $\gamma_j$ is an estimate of the
effect of the minor allele for the $j$th variant on the
phenotype from a single variant GLM, for example.
Under this model, a score test of the null hypothesis
of no association between the trait and rare variants
in the functional unit or genomic region is given by

where 

$$U_i = (y_i - \bar{y}) \sum_j \omega_j (G_{ij} - 2q_j).$$

In this expression, $\bar{y}$ is the mean trait across individuals,
and $q_j$ is the MAF of the $j$th rare variant.
However, in aSUM, the same data are used to determine
the weights, $\omega_j$, and to perform the score test
of association. Consequently, the significance of the 
association is determined by permuting phenotypes, 
and recalculating weights and the test statistic across 
replicates. As an alternative, the data can be split, 
with weights derived in a training set and association 
testing undertaken in the remainder of samples, 
eliminating the need for computationally demanding 
permutations [24]. 
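The need to recalculate the weights within each permutation replicate can be made concrete with a short Python sketch. It is written in the spirit of a data-adaptive sum test but is not the published aSUM procedure: the sign of each weight is taken from the marginal score for a quantitative trait (which has the same sign as the least-squares slope from a single-variant linear regression), no significance threshold is applied before flipping a sign, and the statistic is an unnormalised squared sum of the contributions $U_i$. Only the relative sign of the weights matters, because a global sign flip leaves the squared statistic unchanged. All names and data are hypothetical.

```python
import random


def variant_scores(genotype_rows, trait, mafs):
    """c_j = sum_i (y_i - ybar) * (G_ij - 2 * q_j) for each variant j."""
    mean_y = sum(trait) / len(trait)
    return [sum((y - mean_y) * (row[j] - 2.0 * q) for y, row in zip(trait, genotype_rows))
            for j, q in enumerate(mafs)]


def adaptive_sum_statistic(genotype_rows, trait, mafs):
    """Choose signs from the data, then square the sum of the signed scores."""
    scores = variant_scores(genotype_rows, trait, mafs)
    signs = [1 if c >= 0 else -1 for c in scores]
    total = sum(s * c for s, c in zip(signs, scores))  # equals sum_i U_i
    return signs, total * total


def permutation_p_value(genotype_rows, trait, mafs, n_perm=999, seed=1):
    """Permute phenotypes, re-deriving the signs in every replicate."""
    rng = random.Random(seed)
    _, observed = adaptive_sum_statistic(genotype_rows, trait, mafs)
    shuffled = list(trait)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, permuted = adaptive_sum_statistic(genotype_rows, shuffled, mafs)
        if permuted >= observed:
            n_extreme += 1
    return (n_extreme + 1) / (n_perm + 1)


# Toy data: one trait-increasing and one trait-decreasing rare variant.
genotype_rows = [[1, 0], [0, 1], [0, 0], [0, 0]]
trait = [2.0, -2.0, 0.0, 0.0]
mafs = [0.125, 0.125]
signs, statistic = adaptive_sum_statistic(genotype_rows, trait, mafs)
p_value = permutation_p_value(genotype_rows, trait, mafs, n_perm=199)
```

In this toy example the two variants have opposite effects, so a burden score with equal signs would sum to zero, whereas the data-adaptive signs recover the signal. Reusing the observed signs in the permuted data sets would make the test anti-conservative, which is why they are re-derived in each replicate.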

The aSUM test was extended by Hoffmann et al.
[25] by means of a 'step-up' approach, which considers
a more general weighting scheme, defined by
$\omega_j = a_j \delta_j v_j$. In this expression, $\delta_j$ depends on the
direction of the effect of the $j$th variant, as in the aSUM
test. For dichotomous disease phenotypes, $\delta_j = -1$ if
the $j$th variant is more prevalent in controls than
cases, and $\delta_j = 1$ otherwise, whilst for quantitative
traits, $\delta_j$ denotes the sign of the correlation coefficient
of the trait with the minor allele at the $j$th variant. The quantity
$a_j$ is a continuous weighting function for the $j$th variant
which could, for example, allow for Madsen and
Browning weights [21]. Finally, $v_j$ is an indicator
variable representing the masking scheme, taking
the value $v_j = 1$ if the $j$th variant is included in the
analysis, and $v_j = 0$ otherwise. This indicator variable
could be defined to reflect annotation and/or frequency.
In the 'step-up' approach, forward selection
is used to identify the subset of variants that maximises
the evidence of association with the trait. At each
stage of this iterative process, the variant that maximises
the increase in the score statistic, $X$, is added
to the model, and the process continues until no further variants
increase the evidence of association. The significance
of the association is then determined by permuting
phenotypes and repeating the model selection in
each replicate.

Dispersion tests 

The aSUM and 'step-up' methods alleviate the re- 
strictive assumption of burden tests of the same dir- 
ection of effect of all rare variants on the trait within 
the same functional unit or genomic region, but re- 
quire permutation procedures to assess statistical sig- 
nificance, which may not be computationally feasible 
for genome-wide analyses in large samples. To overcome
this limitation, dispersion tests consider a more
general function $f(\cdot)$, given by

$$f(G_i) = \sum_j \beta_j G_{ij},$$

where $\beta_j$ denotes the effect of each copy of the
minor allele at the $j$th rare variant. Of course, for
rare variants, the allelic effects, $\beta_j$, cannot be reliably
estimated. Consequently, the sequence kernel association
test (SKAT) [26] makes the assumption that
$\beta_j \sim N(0, \tau\omega_j^2)$, where, as before, the weights $\omega_j$
denote the masking scheme, and now $\tau$ is an unknown
variance component parameter. Under the
null hypothesis of no association between the trait
and rare variation in the functional unit or genomic
region, $\beta_j = 0$ for all $j$, which is thus equivalent to
$\tau = 0$. SKAT uses a variance-component score test,
given by

$$Q_{\mathrm{SKAT}} = \sum_j \omega_j^2 S_j^2,$$

where

$$S_j = \sum_i G_{ij}(y_i - \mu),$$

and $\mu = g^{-1}(\alpha)$ is the expected trait value under the
null hypothesis of no association. In the special case
of a dichotomous phenotype with no covariates,
SKAT is equivalent to the C-alpha test [27].
$Q_{\mathrm{SKAT}}$ follows a weighted sum of $\chi^2$ distributions
under the null hypothesis, the significance of
which can be determined analytically, without the
need for permutations.
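The variance-component statistic can be illustrated with the Python sketch below, which assumes an intercept-only null model with identity link, so that the expected trait value under the null is simply the sample mean, and which takes the statistic to be the sum of squared weighted per-variant scores. Evaluation of the p-value from the weighted sum of $\chi^2$ distributions is not attempted here. The function names and data are hypothetical.

```python
def score_per_variant(genotype_rows, trait, null_mean):
    """S_j = sum_i G_ij * (y_i - mu) for each variant j."""
    n_variants = len(genotype_rows[0])
    return [sum(row[j] * (y - null_mean) for row, y in zip(genotype_rows, trait))
            for j in range(n_variants)]


def skat_statistic(genotype_rows, trait, weights):
    """Sum over variants of the squared weighted scores."""
    null_mean = sum(trait) / len(trait)  # intercept-only null, identity link
    scores = score_per_variant(genotype_rows, trait, null_mean)
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Toy data: one trait-increasing and one trait-decreasing rare variant.
genotype_rows = [[1, 0], [0, 1], [0, 0], [0, 0]]
trait = [2.0, -2.0, 0.0, 0.0]
scores = score_per_variant(genotype_rows, trait, null_mean=0.0)
q_skat = skat_statistic(genotype_rows, trait, weights=[1.0, 1.0])
```

Because each score is squared before summation, variants with opposite directions of effect reinforce rather than cancel one another: here the scores sum to zero, yet the statistic is positive.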

Burden and dispersion tests have been designed to 
test for association of rare variants in the same
functional unit or genomic regions under different
models of the effect of minor alleles on a complex
trait, in particular, their direction of effect on phenotype.
In an attempt to develop an approach that
would be applicable across a wider range of association
models, Lee et al. [28] proposed a linear combination
of burden and dispersion score tests,
constructed within the SKAT analysis framework.
More specifically,

$$Q_\rho = (1 - \rho)\,Q_{\mathrm{SKAT}} + \rho\,Q_{\mathrm{burden}},$$

where

$$Q_{\mathrm{burden}} = \Big[\sum_j \omega_j S_j\Big]^2.$$

For a fixed mixture parameter, $\rho$, the test statistic $Q_\rho$
follows a weighted sum of $\chi^2$ distributions under
the null hypothesis of no association. Alternatively,
$\rho$ can be treated as an unknown nuisance parameter,
and a data-driven procedure, SKAT-O, used to 
evaluate significance, without the need for compu- 
tationally intensive permutations. A similar frame- 
work, combining a variance component and 
generalised burden test as independent score statistics, 
using Fisher's or Tippett's procedures, has been 
implemented in the Mixed effects Score Test 
(MiST) [29]. 
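The linear combination of the two score statistics is straightforward to compute once the per-variant scores are available, as the following Python sketch shows. It evaluates the combined statistic over a grid of mixture parameters and does not reproduce the data-driven significance evaluation of SKAT-O; the function name, scores and grid are hypothetical.

```python
def combined_statistic(scores, weights, rho):
    """(1 - rho) * Q_SKAT + rho * Q_burden for per-variant scores S_j."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return (1.0 - rho) * q_skat + rho * q_burden


scores = [2.0, -2.0, 1.0]   # hypothetical per-variant scores S_j
weights = [1.0, 1.0, 1.0]
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
values = {rho: combined_statistic(scores, weights, rho) for rho in grid}
```

Setting the mixture parameter to 0 recovers the dispersion statistic and setting it to 1 recovers the burden statistic, so intermediate values trade off robustness to opposite directions of effect against power when effects are unidirectional.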

Adaptive clustering methods 

An alternative approach to allow for rare genetic 
variants within a functional unit or genomic region 
to have different direction and/or magnitude of 
effects on a complex trait is to make use of a 
kernel-based adaptive cluster (KBAC) [30], which
categorises individuals according to 'genotype
groups'. In general, there are $3^J$ possible genotype
groups across a set of $J$ variants. However, for rare
variants, most of these possible genotype groups will
not be seen because of low MAF, and, instead, we
observe only $M + 1$ patterns, denoted $P_0, P_1, \ldots,
P_M$, where $P_0$ represents a pattern of common
homozygotes only. The advantage of KBAC is that
the genotype patterns encompass a wide range of
possible models of association; for example, allowing
for interactions between rare variants that cannot be
easily incorporated with simple linear functions, $f(\cdot)$.
For KBAC, a kernel $K_m$ is defined for each pattern
$P_m$ of genotypes. Consequently, the function $f(\cdot)$ can
be expressed as $f(G_i) = \gamma K_m$, where $P_m$ is the pattern
of genotypes carried by the $i$th individual, and a
score test of the null hypothesis of no association
of the trait with rare variants in the functional unit
or genomic region, $\gamma = 0$, is constructed for the
specified kernel.
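The grouping of individuals by multi-site genotype pattern can be sketched in Python as follows. The sketch only tabulates, for each observed pattern, the number of individuals and the number of cases carrying it; the kernel itself is not implemented. The function name and toy data are hypothetical.

```python
from collections import Counter


def pattern_counts(genotype_rows, is_case):
    """Map each observed genotype pattern to (total carriers, case carriers)."""
    totals, cases = Counter(), Counter()
    for row, case in zip(genotype_rows, is_case):
        pattern = tuple(row)
        totals[pattern] += 1
        if case:
            cases[pattern] += 1
    return {pattern: (totals[pattern], cases[pattern]) for pattern in totals}


# Toy data: six individuals typed at J = 3 rare variants.
genotype_rows = [[0, 0, 0], [0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0], [1, 0, 0]]
is_case = [True, False, True, True, False, False]
counts = pattern_counts(genotype_rows, is_case)
n_possible = 3 ** len(genotype_rows[0])
```

With three variants there are 27 possible patterns, but only three are observed in this toy sample, which illustrates why the number of patterns that must be modelled remains manageable for rare variants.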

For dichotomous disease phenotypes, a hyper- 
geometric kernel is appropriate, and it is given by 



In this expression, $N$ is the total number of individuals
in the study, of which $N_m$ carry genotype pattern
$P_m$ across rare variants in the functional unit or
genomic region, and $N^{A}$ and $N_m^{A}$ denote the same
quantities, respectively, in cases. For this kernel, in
the absence of covariates, the KBAC score test is 
given by 



Qkbac; — 



with significance assessed via permutation. 



POWER OF RARE VARIANT 
METHODS TO DETECT 
ASSOCIATION WITH COMPLEX 
TRAITS 

As described above, there is a huge range of meth- 
odology available to detect association of complex 
traits with rare genetic variation in the same func- 
tional unit or genomic region. The majority of 
methods have been developed in the flexible GLM 
framework, but impose different underlying models 
of association that would be expected to be most 
powerful when the specific modelling assumptions 
are correct. For example, we might expect that the 
burden tests will be most powerful when all rare 
variants, after application of an appropriate masking 
scheme, have the same direction of effect on the 
complex trait. However, dispersion tests would be 
expected to be more robust to neutral variants, or 
to those with opposite directions of effect on the 
trait. Consequently, it seems unlikely that there 
will be a single 'uniformly most powerful' rare variant
association test over all possible underlying gen-
etic architectures. 

Ladouceur et al. provide one of the most compre- 
hensive evaluations of rare variant methodology to 
date [31]. They assess the comparative power of sev- 
eral burden tests, as well as SKAT and an adaptive 
clustering method (inspired by KBAC). They 
employ Sanger sequencing data from ~2,000 individuals
at seven genes, and simulate continuous traits
over a range of genetic models spanning different
hypotheses for the effects of rare genetic variation
in the genes. They also investigate the performance 
of rare variant methods for dichotomous traits by 
using 500 cases and 500 controls selected from the 
extremes of the distribution. As seen previously [32], 
the power across tests was found to be affected by the 
proportion of causal variants in a gene, as well as 
their effect sizes. While the power of tests on continuous
traits increased monotonically with larger
effect sizes, tests on dichotomous traits seemed to 
be less affected. The power of collapsing tests 
increased more sharply as the number of causal var- 
iants increased. The VT method outperformed alter- 
natives in scenarios where rarer variants had stronger 
effects, but only for continuous phenotypes. SKAT 
was found to be more powerful than alternatives 
when mixtures of deleterious and protective variants 
were driving the association, as expected. SKAT was 
also the most powerful approach when a combina- 
tion of common and rare variants was driving the 
association. 

Given that burden and dispersion tests appear to 
have differential advantages, tests combining the two 
approaches seem like an attractive alternative. 
Indeed, both SKAT-O and MiST have been 
reported as performing well under a range of phe- 
notypes with varying causal to total variant distribu- 
tions, irrespective of their direction of effect [28, 29]. 
However, these methods are still to be subjected to
independent evaluation. A comparison of rare variant 
methods on larger (>1000) sample sizes would also 
be particularly informative, since most comparative 
studies to date [28, 29, 31, 32] have been conducted 
on smaller sample sizes than ongoing sequencing 
efforts. 

The power of rare variant association method- 
ology is also likely to vary according to the technol- 
ogy used to assay genetic variation. Magi et al. [16] 
undertook simulations to evaluate the relative per- 
formance of different design strategies to identify 
association of a quantitative trait with rare variants 
in a 50 kb gene using GRANVIL, including: (i) re- 
sequencing; (ii) genotyping of all variants present in a 
reference panel from the same population; and (iii) 
imputation of a GWAS scaffold of primarily 
common variants up to the reference panel using 
IMPUTE v2 [33]. They considered a model in 
which the expected trait value of an individual was 
increased by the presence of a minor allele at any 
causal variant in the gene. The trait association 



model was then parameterised in terms of (i) the 
maximum MAF of any causal variant in the gene;
(ii) the total MAF of all causal variants in the gene;
and (iii) their joint contribution to the trait variance. 
They also considered a range of sizes for the refer- 
ence panel, varying from 150 to 4000 individuals, 
reflecting current and future efforts from the 1000 
Genomes Project [13] and the UK10K Project
(www.uk10k.org).

As expected, the most powerful strategy for
detecting rare variant association was through
re-sequencing, which, in the absence of calling and
genotyping errors, provides a complete catalogue of 
genetic variation in the gene. However, a strategy of 
genotyping all rare variants present in a large,
population-matched reference panel results in a relatively
small reduction in power. Rare variants not captured 
by the reference panel (such as private mutations or 
those of very low frequency) are less likely to have a 
major impact on the trait under their simulation 
model, and thus, would not be expected to lead to 
a dramatic reduction in power. In the same way, 
imputation of a GWAS scaffold up to a large, popu- 
lation-matched reference panel also retains much of 
the power of the re-sequencing strategy. Larger ref- 
erence panels provide more comprehensive coverage 
of rare variation in the gene, and higher quality
imputation, allowing recovery of genotypes at variants
with MAF as low as 0.3% [34]. However, imputation
of variation of lower MAF remains a
considerable challenge, and it is not clear that the
quality metrics used for common SNPs will be sufficient
for removing poorly performing rare variants
from downstream association analyses [35]. For
this reason, imputation can never replace the 'gold
standard approach' to assaying rare genetic variation
through re-sequencing, although it currently
provides a financially feasible, complementary strategy
for detecting association with complex traits in
the required large sample sizes at a fraction of
the cost.



DISCUSSION 

Statistical methodology for the analysis of rare genetic 
variation in the next generation of GWAS has been 
primarily developed in a flexible GLM framework, 
which can be applied to directly assayed or imputed
genotype data and to quantitative traits or dichotom- 
ous disease phenotypes. The majority of statistical 
methods can be classified as burden tests, which
assume the same direction of effect on the trait of all
rare variants, dispersion tests, which allow for deviations
from this unidirectional assumption, or a combination
of the two approaches. The relative utility
and power of these approaches depend on: (i) the
computational burden (e.g. the need for permutations
to evaluate statistical significance); (ii) the reliability of
annotation (e.g. identification of coding variation that
is more likely to have functional consequences); and
(iii) the alignment of modelling assumptions with the
underlying genetic architecture of the trait (e.g. robustness
to neutral variants and an assumption of all
causal alleles having the same magnitude and direction
of effect). Simulations highlight that there is no uniformly
most powerful approach but that methods that
combine burden and dispersion tests are relatively
robust to various underlying genetic architectures.

Until recently, rare variant association studies 
have been limited to candidate genes (functional or 
positional in GWAS loci) because of the expense of 
re-sequencing in large sample sizes. Despite these 
constraints, confirmed rare variant associations include:
(i) plasma lipid concentrations with ABCA1,
APOA1, LCAT, NPC1L1 and ANGPTL4 [36-38];
(ii) body mass index with monogenic obesity-related 
genes [39]; (iii) blood pressure with renal salt hand- 
ling genes [40]; (iv) hypertriglyceridemia with lipo- 
protein lipase [41]; (v) inflammatory bowel disease 
with NOD2 [42]; and (vi) type 2 diabetes with 
MTNR1B [43]. However, with recent improvements
in the throughput and efficiency of re-sequencing
technologies and advances in statistical methodology
to allow imputation of existing GWAS scaffolds up
to high-density reference panels, genotypes at rare
genetic variants are increasingly being
interrogated in the sample sizes required for complex
human traits. Consequently, genome- and exome- 
wide analyses of rare genetic variation have identified 
novel genes implicated in high-density lipoprotein 
cholesterol [44], insulin processing and secretion 
[45], and type 2 diabetes [46]. 

Despite these success stories, further methodo- 
logical development to maximise the potential of 
next-generation GWAS to identify rare variant asso- 
ciations with complex human traits is required. 
Improved functional annotation and a better under- 
standing of the role of non-coding regulatory vari- 
ation (e.g., through the ENCODE Project 
Consortium [47]) will inform study design and
define powerful weighting schemes for rare variant 
analyses. Methodology to enable meta-analysis of 



rare variant association tests [48-50], by combining
summary statistics across GWAS, would be expected 
to increase power but may be complicated by the 
observation that lower-frequency variation is more 
likely to be population-specific and, thus, may not be 
shared between studies, particularly in a trans-ethnic 
context. Nevertheless, with continued methodo- 
logical development and increased availability of 
next-generation GWAS of rare genetic variation, 
the coming years offer an exciting opportunity to 
discover novel genes implicated in complex human 
traits and an improved understanding of the genetic 
architecture and pathophysiology of human disease, 
with the ultimate goal of developing effective clinical 
intervention, resulting in improved public health. 



FUNDING 

APM acknowledges financial support from the 
Wellcome Trust (grant numbers WT098017 and 
WT090532).



References 

1. Hindorff LA, Sethupathy P, Junkins HA, et al. Potential
etiologic and functional implications of genome-wide association
loci for human diseases and traits. Proc Natl Acad Sci
USA 2009;106:9362-7.

2. Manolio TA, Collins FS, Cox NJ, et al. Finding the
missing heritability of complex diseases. Nature 2009;461:
747-53.

3. Lango Allen H, Estrada K, Lettre G, et al. Hundreds of
variants clustered in genomic loci and biological pathways
affect human height. Nature 2010;467:832-8.

4. Morris AP, Voight BF, Teslovich TM, et al. Large-scale
association analysis provides insights into the genetic architecture
and pathophysiology of type 2 diabetes. Nat Genet
2012;44:981-90.

5. Yang J, Benyamin B, McEvoy BP, et al. Common SNPs
explain a large proportion of the heritability for human
height. Nat Genet 2010;42:565-9.



Key points 

• There has been recent speculation that rare genetic variants, 
typically defined to have a minor allele frequency of less than 1%, 